Data Visualization#

Latest update: 2024-09-01

Load data#

import pandas as pd
import sys
sys.path.append('../')
from source.bokeh_plots import *
from source.data_visualization import *
output_notebook()

file_path = '../data/'
model_name = 'AML Epigenomic Risk'

# Read the data
df = pd.read_excel(file_path + 'alma_main_results.xlsx', index_col=0).sort_index()
sig_results = pd.read_excel(file_path + 'signature_results.xlsx', index_col=0).sort_index()

df = df.join(sig_results)

# Define train and test samples (copies, so later column assignments do not raise SettingWithCopyWarning)
df_train = df[df['Train-Test'] == 'Train Sample'].copy()
df_test = df[df['Train-Test'] == 'Test Sample'].copy()

# Prognostic model samples
df_px = df[~df['Vital Status at 5y'].isna()]
df_px2 = df_px[df_px['Clinical Trial'].isin(['AAML0531', 'AAML1031', 'AAML03P1'])]
df_px2 = df_px2[df_px2['Sample Type'].isin(
    ['Diagnosis', 'Primary Blood Derived Cancer - Bone Marrow', 'Primary Blood Derived Cancer - Peripheral Blood'])]
df_px2 = df_px2[~df_px2['Patient_ID'].duplicated(keep='last')]

# drop the samples with missing labels for the WHO 2022 Diagnosis
df_dx = df_train[~df_train['WHO 2022 Diagnosis'].isna()]

# exclude the classes with fewer than 5 samples
df_dx = df_dx[~df_dx['WHO 2022 Diagnosis'].isin(['AML with t(9;22); BCR::ABL1'])]


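# Rank samples by predicted 5-year death probability; the zero-based rank is stored as 'Percentile'
df_px_ = df_px.sort_values(by='P(Death) at 5y').reset_index().reset_index(names=['Percentile']).set_index('index')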
df_px_['Percentile'] = df_px_['Percentile'] / len(df_px_['Percentile'])
df2 = df.join(df_px_[['Percentile']])
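The `Percentile` column built above is the zero-based rank of each sample by `P(Death) at 5y` divided by the number of ranked samples, so it runs from 0 up to, but not including, 1. A minimal plain-Python sketch of the same idea follows; the sample IDs and probabilities are made up for illustration and are not study data.

```python
# Toy illustration of rank-based percentiles (hypothetical values, not study data).
p_death = {"s1": 0.80, "s2": 0.10, "s3": 0.45, "s4": 0.30}

# Sort sample IDs by predicted probability, lowest first.
ranked = sorted(p_death, key=p_death.get)

# Percentile = zero-based rank / number of ranked samples.
percentile = {sample: rank / len(ranked) for rank, sample in enumerate(ranked)}

print(percentile)
```

The sample with the highest predicted probability never reaches 1.0 because the rank starts at zero.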

Interactive atlas#

plot_linked_scatters(df, table=False)

Patient Characteristics#

ALMA (unsupervised)#

from tableone import TableOne
from datetime import date

columns = ['Hematopoietic Entity','Age (group years)','Sex',
            'Clinical Trial',]

mytable_cog = TableOne(df_train.reset_index(), columns,
                        overall=False, missing=False,
                        pval=False, pval_adjust=False,
                        htest_name=True,dip_test=True,
                        tukey_test=True, normal_test=True,

                        order={'FLT3 ITD':['Yes','No'],
                                'Age (group years)':['0-5','5-13','13-39','39-60'],
                                'MRD 1 Status': ['Positive'],
                                'Risk Group': ['High Risk', 'Standard Risk'],
                                'Leucocyte counts (10⁹/L)': ['≥30'],
                                'Age group (years)': ['≥10']})

mytable_cog.to_excel('../data/pt_characteristics_alma_model_' + str(date.today()) +'.xlsx')

mytable_cog.tabulate(tablefmt="html", 
                        # headers=[score_name,"",'Missing','Discovery','Validation','p-value','Statistical Test']
                        )
                                                                         Overall
n                                                                        3314
Hematopoietic Entity, n (%)  Acute lymphoblastic leukemia (ALL)          700 (28.3)
                             Acute myeloid leukemia (AML)                1213 (49.1)
                             Acute promyelocytic leukemia (APL)          31 (1.3)
                             Mixed phenotype acute leukemia (MPAL)       50 (2.0)
                             Myelodysplastic syndrome (MDS or MDS-like)  225 (9.1)
                             Otherwise-Normal (Control)                  251 (10.2)
Age (group years), n (%)     0-5                                         480 (24.1)
                             5-13                                        483 (24.2)
                             13-39                                       663 (33.2)
                             39-60                                       165 (8.3)
                             60+                                         203 (10.2)
Sex, n (%)                   Female                                      885 (49.1)
                             Male                                        918 (50.9)
Clinical Trial, n (%)        AAML03P1                                    72 (2.2)
                             AAML0531                                    628 (19.2)
                             AAML1031                                    587 (17.9)
                             Beat AML Consortium                         316 (9.7)
                             CCG2961                                     41 (1.3)
                             CETLAM SMD-09 (MDS-tAML)                    166 (5.1)
                             French GRAALL 2003–2005                     141 (4.3)
                             Japanese AML05                              64 (2.0)
                             NOPHO ALL92-2000                            933 (28.5)
                             TARGET ALL                                  131 (4.0)
                             TCGA AML                                    194 (5.9)
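The percentages in this table appear to be computed over samples with a non-missing value for each variable rather than over all 3314 samples: the six Hematopoietic Entity counts sum to 2470, and 700 / 2470 gives the reported 28.3%. A quick check of that arithmetic, with the counts copied from the table (the short dictionary keys are abbreviations chosen here):

```python
# Counts copied from the "Hematopoietic Entity" rows of the table above.
entity_counts = {
    "ALL": 700, "AML": 1213, "APL": 31,
    "MPAL": 50, "MDS or MDS-like": 225, "Control": 251,
}

# Denominator: samples that carry an entity label.
n_labelled = sum(entity_counts.values())

# Percentage per level, rounded to one decimal as in the table.
percentages = {k: round(100 * v / n_labelled, 1) for k, v in entity_counts.items()}

print(n_labelled)
print(percentages)
```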

Fine-tuned (supervised) Dx and Px models#

columns = ['Age (years)','Age group (years)','Sex','Race or ethnic group',
            'Hispanic or Latino ethnic group', 'MRD 1 Status',
            'Leucocyte counts (10⁹/L)', 'BM leukemic blasts (%)',
            'Risk Group','FLT3 ITD', 'Clinical Trial']

df_test['Age (years)'] = df_test['Age (years)'].astype(float)

# join discovery clinical data with validation clinical data
all_cohorts = pd.concat([df_dx, df_px2, df_test],
                         axis=0, keys=['Dx Discovery','Px Discovery' ,'Validation'],
                         names=['cohort']).reset_index()

# columns = ['Age group (years)','Sex', 'MRD 1 Status',
#             'Leucocyte counts (10⁹/L)',
#             'Risk Group','FLT3 ITD', 'Treatment Arm','Clinical Trial']

mytable_cog = TableOne(all_cohorts, columns,
                        overall=False, missing=False,
                        pval=False, pval_adjust=False,
                        htest_name=True,dip_test=True,
                        tukey_test=True, normal_test=True,

                        order={'FLT3 ITD':['Yes','No'],
                                'Race or ethnic group':['White','Black or African American','Asian'],
                                'MRD 1 Status': ['Positive'],
                                'Risk Group': ['High Risk', 'Standard Risk'],
                                'Leucocyte counts (10⁹/L)': ['≥30'],
                                'Age group (years)': ['≥10']},
                                groupby='cohort')

mytable_cog.to_excel('../data/pt_characteristics_fine-tuned_models_' + str(date.today()) +'.xlsx')

mytable_cog.tabulate(tablefmt="html", 
                        # headers=[score_name,"",score_name,'Validation','p-value','Statistical Test']
)
                                                                                   Dx Discovery  Px Discovery  Validation
n                                                                                  2467          946           201
Age (years), mean (SD)                                                             19.2 (19.7)   9.4 (6.3)     8.8 (6.0)
Age group (years), n (%)                ≥10                                        528 (47.5)    463 (48.9)    95 (47.7)
                                        <10                                        584 (52.5)    483 (51.1)    104 (52.3)
Sex, n (%)                              Female                                     707 (50.4)    468 (49.5)    87 (43.3)
                                        Male                                       697 (49.6)    478 (50.5)    114 (56.7)
Race or ethnic group, n (%)             White                                      1061 (80.5)   697 (79.1)    143 (71.9)
                                        Black or African American                  131 (9.9)     102 (11.6)    32 (16.1)
                                        Asian                                      65 (4.9)      43 (4.9)      1 (0.5)
                                        American Indian or Alaska Native           7 (0.5)       5 (0.6)
                                        Native Hawaiian or other Pacific Islander  7 (0.5)       6 (0.7)       2 (1.0)
                                        Other                                      47 (3.6)      28 (3.2)      21 (10.6)
Hispanic or Latino ethnic group, n (%)  Hispanic or Latino                         207 (19.4)    185 (20.2)    25 (12.6)
                                        Not Hispanic or Latino                     858 (80.6)    731 (79.8)    174 (87.4)
MRD 1 Status, n (%)                     Positive                                   284 (29.7)    260 (31.5)    76 (40.2)
                                        Negative                                   673 (70.3)    566 (68.5)    113 (59.8)
Leucocyte counts (10⁹/L), n (%)         ≥30                                        579 (52.5)    467 (49.4)    88 (44.0)
                                        <30                                        524 (47.5)    479 (50.6)    112 (56.0)
BM leukemic blasts (%), mean (SD)                                                  65.7 (24.1)   63.8 (24.5)   60.0 (25.6)
Risk Group, n (%)                       High Risk                                  196 (14.1)    129 (13.8)    51 (25.4)
                                        Standard Risk                              626 (45.0)    454 (48.7)    87 (43.3)
                                        Low Risk                                   570 (40.9)    349 (37.4)    63 (31.3)
FLT3 ITD, n (%)                         Yes                                        180 (16.2)    165 (17.5)    31 (15.6)
                                        No                                         930 (83.8)    779 (82.5)    168 (84.4)
Clinical Trial, n (%)                   AAML03P1                                   62 (2.6)      36 (3.8)
                                        AAML0531                                   515 (21.2)    507 (53.6)
                                        AAML1031                                   495 (20.4)    403 (42.6)
                                        Beat AML Consortium                        192 (7.9)
                                        CCG2961                                    31 (1.3)
                                        CETLAM SMD-09 (MDS-tAML)                   166 (6.8)
                                        French GRAALL 2003–2005                    141 (5.8)
                                        Japanese AML05                             9 (0.4)
                                        NOPHO ALL92-2000                           641 (26.4)
                                        TARGET ALL                                 56 (2.3)
                                        TCGA AML                                   118 (4.9)
                                        AML02                                                                  159 (79.1)
                                        AML08                                                                  42 (20.9)

By prognostic group#

Discovery#

AML Epigenomic Risk

def pt_characteristics_by_model(df, model_name, traintest = 'discovery'):
        columns = ['Age (years)','Age group (years)','Sex','Race or ethnic group',
                'Hispanic or Latino ethnic group', 'MRD 1 Status',
                'Leucocyte counts (10⁹/L)', 'BM leukemic blasts (%)',
                'Risk Group', 'Clinical Trial','FLT3 ITD', 'Treatment Arm']

        mytable_cog = TableOne(df, columns,
                                overall=False, missing=False,
                                pval=True, pval_adjust=False,
                                htest_name=True,dip_test=True,
                                tukey_test=True, normal_test=True,

                                order={'FLT3 ITD':['Yes','No'],
                                        'Race or ethnic group':['White','Black or African American','Asian'],
                                        'MRD 1 Status': ['Positive'],
                                        'Risk Group': ['High Risk', 'Standard Risk'],
                                        'Leucocyte counts (10⁹/L)': ['≥30'],
                                        'Age group (years)': ['≥10']},
                                groupby=model_name)

        mytable_cog.to_excel('../data/pt_characteristics_'+ model_name +'_' + traintest + '_' + str(date.today()) + '.xlsx')

        return(mytable_cog.tabulate(tablefmt="html", 
                                headers=[model_name + ' ' + traintest,"",'High','Low','p-value','Statistical Test']))

pt_characteristics_by_model(df_px2, model_name, 'Discovery')
AML Epigenomic Risk Discovery                                                      High         Low          p-value  Statistical Test
n                                                                                  442          504
Age (years), mean (SD)                                                             8.6 (6.5)    10.2 (6.1)   <0.001   Two Sample T-test
Age group (years), n (%)                ≥10                                        192 (43.4)   271 (53.8)   0.002    Chi-squared
                                        <10                                        250 (56.6)   233 (46.2)
Sex, n (%)                              Female                                     222 (50.2)   246 (48.8)   0.712    Chi-squared
                                        Male                                       220 (49.8)   258 (51.2)
Race or ethnic group, n (%)             White                                      324 (78.3)   373 (79.9)   0.676    Chi-squared (warning: expected count < 5)
                                        Black or African American                  53 (12.8)    49 (10.5)
                                        Asian                                      19 (4.6)     24 (5.1)
                                        American Indian or Alaska Native           3 (0.7)      2 (0.4)
                                        Native Hawaiian or other Pacific Islander  4 (1.0)      2 (0.4)
                                        Other                                      11 (2.7)     17 (3.6)
Hispanic or Latino ethnic group, n (%)  Hispanic or Latino                         84 (19.6)    101 (20.7)   0.724    Chi-squared
                                        Not Hispanic or Latino                     345 (80.4)   386 (79.3)
MRD 1 Status, n (%)                     Positive                                   159 (41.3)   101 (22.9)   <0.001   Chi-squared
                                        Negative                                   226 (58.7)   340 (77.1)
Leucocyte counts (10⁹/L), n (%)         ≥30                                        190 (43.0)   277 (55.0)   <0.001   Chi-squared
                                        <30                                        252 (57.0)   227 (45.0)
BM leukemic blasts (%), mean (SD)                                                  65.6 (26.4)  62.2 (22.7)  0.043    Two Sample T-test
Risk Group, n (%)                       High Risk                                  84 (19.4)    45 (9.0)     <0.001   Chi-squared
                                        Standard Risk                              317 (73.4)   137 (27.4)
                                        Low Risk                                   31 (7.2)     318 (63.6)
Clinical Trial, n (%)                   AAML03P1                                   21 (4.8)     15 (3.0)     0.110    Chi-squared
                                        AAML0531                                   223 (50.5)   284 (56.3)
                                        AAML1031                                   198 (44.8)   205 (40.7)
FLT3 ITD, n (%)                         Yes                                        85 (19.3)    80 (15.9)    0.203    Chi-squared
                                        No                                         356 (80.7)   423 (84.1)
Treatment Arm, n (%)                    Arm A                                      109 (44.7)   149 (50.0)   0.250    Chi-squared
                                        Arm B                                      135 (55.3)   149 (50.0)
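The Chi-squared p-values in these tables can be sanity-checked by hand. The sketch below recomputes the Age group comparison (≥10 vs <10 by High vs Low) from the counts above using only the standard library. It assumes a 2×2 Pearson chi-squared test with Yates' continuity correction, which may or may not be exactly what `tableone` applies internally; under that assumption the p-value comes out close to the reported 0.002.

```python
import math

# Observed counts from the table above: rows are age groups, columns are (High, Low).
observed = [[192, 271],   # >=10 years
            [250, 233]]   # <10 years

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Pearson chi-squared statistic with Yates' continuity correction (an assumption).
chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (abs(obs - expected) - 0.5) ** 2 / expected

# With 1 degree of freedom the upper-tail probability has a closed form.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

The closed form only holds for one degree of freedom, so larger tables such as Race or Risk Group would need a general chi-squared distribution function.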

MethylScoreAML-37CpGs

pt_characteristics_by_model(df_px2, model_name='MethylScoreAML Categorical', traintest='Discovery')
MethylScoreAML Categorical Discovery                                               High         Low          p-value  Statistical Test
n                                                                                  176          770
Age (years), mean (SD)                                                             9.2 (6.5)    9.5 (6.3)    0.666    Two Sample T-test
Age group (years), n (%)                ≥10                                        88 (50.0)    375 (48.7)   0.820    Chi-squared
                                        <10                                        88 (50.0)    395 (51.3)
Sex, n (%)                              Female                                     86 (48.9)    382 (49.6)   0.924    Chi-squared
                                        Male                                       90 (51.1)    388 (50.4)
Race or ethnic group, n (%)             White                                      131 (79.4)   566 (79.1)   0.138    Chi-squared (warning: expected count < 5)
                                        Black or African American                  26 (15.8)    76 (10.6)
                                        Asian                                      5 (3.0)      38 (5.3)
                                        American Indian or Alaska Native           1 (0.6)      4 (0.6)
                                        Other                                      2 (1.2)      26 (3.6)
                                        Native Hawaiian or other Pacific Islander               6 (0.8)
Hispanic or Latino ethnic group, n (%)  Hispanic or Latino                         34 (20.1)    151 (20.2)   1.000    Chi-squared
                                        Not Hispanic or Latino                     135 (79.9)   596 (79.8)
MRD 1 Status, n (%)                     Positive                                   64 (43.5)    196 (28.9)   0.001    Chi-squared
                                        Negative                                   83 (56.5)    483 (71.1)
Leucocyte counts (10⁹/L), n (%)         ≥30                                        82 (46.6)    385 (50.0)   0.464    Chi-squared
                                        <30                                        94 (53.4)    385 (50.0)
BM leukemic blasts (%), mean (SD)                                                  72.5 (21.8)  61.8 (24.7)  <0.001   Two Sample T-test
Risk Group, n (%)                       High Risk                                  31 (17.9)    98 (12.9)    <0.001   Chi-squared
                                        Standard Risk                              132 (76.3)   322 (42.4)
                                        Low Risk                                   10 (5.8)     339 (44.7)
Clinical Trial, n (%)                   AAML03P1                                   6 (3.4)      30 (3.9)     0.729    Chi-squared
                                        AAML0531                                   99 (56.2)    408 (53.0)
                                        AAML1031                                   71 (40.3)    332 (43.1)
FLT3 ITD, n (%)                         Yes                                        27 (15.4)    138 (17.9)   0.496    Chi-squared
                                        No                                         148 (84.6)   631 (82.1)
Treatment Arm, n (%)                    Arm A                                      56 (53.3)    202 (46.2)   0.230    Chi-squared
                                        Arm B                                      49 (46.7)    235 (53.8)

Validation#

AML Epigenomic Risk

pt_characteristics_by_model(df_test, model_name, 'validation')
AML Epigenomic Risk validation                                                     High         Low          p-value  Statistical Test
n                                                                                  82           119
Age (years), mean (SD)                                                             7.5 (6.1)    9.6 (5.8)    0.013    Two Sample T-test
Age group (years), n (%)                ≥10                                        30 (37.0)    65 (55.1)    0.018    Chi-squared
                                        <10                                        51 (63.0)    53 (44.9)
Sex, n (%)                              Female                                     36 (43.9)    51 (42.9)    0.998    Chi-squared
                                        Male                                       46 (56.1)    68 (57.1)
Race or ethnic group, n (%)             White                                      59 (73.8)    84 (70.6)    0.589    Chi-squared (warning: expected count < 5)
                                        Black or African American                  13 (16.2)    19 (16.0)
                                        Asian                                      1 (1.2)
                                        Native Hawaiian or other Pacific Islander  1 (1.2)      1 (0.8)
                                        Other                                      6 (7.5)      15 (12.6)
Hispanic or Latino ethnic group, n (%)  Hispanic or Latino                         11 (13.6)    14 (11.9)    0.888    Chi-squared
                                        Not Hispanic or Latino                     70 (86.4)    104 (88.1)
MRD 1 Status, n (%)                     Positive                                   34 (44.2)    42 (37.5)    0.444    Chi-squared
                                        Negative                                   43 (55.8)    70 (62.5)
Leucocyte counts (10⁹/L), n (%)         ≥30                                        33 (40.7)    55 (46.2)    0.535    Chi-squared
                                        <30                                        48 (59.3)    64 (53.8)
BM leukemic blasts (%), mean (SD)                                                  64.8 (27.0)  56.9 (24.3)  0.049    Two Sample T-test
Risk Group, n (%)                       High Risk                                  31 (37.8)    20 (16.8)    <0.001   Chi-squared
                                        Standard Risk                              44 (53.7)    43 (36.1)
                                        Low Risk                                   7 (8.5)      56 (47.1)
Clinical Trial, n (%)                   AML02                                      67 (81.7)    92 (77.3)    0.564    Chi-squared
                                        AML08                                      15 (18.3)    27 (22.7)
FLT3 ITD, n (%)                         Yes                                        14 (17.3)    17 (14.4)    0.726    Chi-squared
                                        No                                         67 (82.7)    101 (85.6)
Treatment Arm, n (%)                    Arm A                                      44 (55.0)    63 (52.9)    0.888    Chi-squared
                                        Arm B                                      36 (45.0)    56 (47.1)

MethylScoreAML-37CpGs

pt_characteristics_by_model(df_test, model_name='MethylScoreAML Categorical', traintest='Validation')
MethylScoreAML Categorical Validation                                              High         Low          p-value  Statistical Test
n                                                                                  48           153
Age (years), mean (SD)                                                             7.8 (6.4)    9.1 (5.8)    0.207    Two Sample T-test
Age group (years), n (%)                ≥10                                        20 (42.6)    75 (49.3)    0.517    Chi-squared
                                        <10                                        27 (57.4)    77 (50.7)
Sex, n (%)                              Female                                     25 (52.1)    62 (40.5)    0.214    Chi-squared
                                        Male                                       23 (47.9)    91 (59.5)
Race or ethnic group, n (%)             White                                      35 (74.5)    108 (71.1)   0.171    Chi-squared (warning: expected count < 5)
                                        Black or African American                  8 (17.0)     24 (15.8)
                                        Asian                                      1 (2.1)
                                        Native Hawaiian or other Pacific Islander  1 (2.1)      1 (0.7)
                                        Other                                      2 (4.3)      19 (12.5)
Hispanic or Latino ethnic group, n (%)  Hispanic or Latino                         10 (21.3)    15 (9.9)     0.070    Chi-squared
                                        Not Hispanic or Latino                     37 (78.7)    137 (90.1)
MRD 1 Status, n (%)                     Positive                                   19 (41.3)    57 (39.9)    0.999    Chi-squared
                                        Negative                                   27 (58.7)    86 (60.1)
Leucocyte counts (10⁹/L), n (%)         ≥30                                        24 (51.1)    64 (41.8)    0.343    Chi-squared
                                        <30                                        23 (48.9)    89 (58.2)
BM leukemic blasts (%), mean (SD)                                                  71.2 (23.8)  56.8 (25.3)  0.001    Two Sample T-test
Risk Group, n (%)                       High Risk                                  10 (20.8)    41 (26.8)    0.002    Chi-squared
                                        Standard Risk                              31 (64.6)    56 (36.6)
                                        Low Risk                                   7 (14.6)     56 (36.6)
Clinical Trial, n (%)                   AML02                                      34 (70.8)    125 (81.7)   0.158    Chi-squared
                                        AML08                                      14 (29.2)    28 (18.3)
FLT3 ITD, n (%)                         Yes                                        5 (10.6)     26 (17.1)    0.402    Chi-squared
                                        No                                         42 (89.4)    126 (82.9)
Treatment Arm, n (%)                    Arm A                                      24 (51.1)    83 (54.6)    0.796    Chi-squared
                                        Arm B                                      23 (48.9)    69 (45.4)

Kaplan-Meier Plots#

Overall study population#

AML Epigenomic Risk

for dataset, trial in zip([df_px2, df_test], 
                          ['Discovery', 'Validation']):
    draw_kaplan_meier(model_name=model_name,
                        df=dataset,
                        save_survival_table=False,
                        save_plot=False,
                        show_ci=False,
                        add_risk_counts=False,
                        trialname=trial,
                        figsize=(8,8))
[Figures: Kaplan-Meier curves by AML Epigenomic Risk for the Discovery and Validation cohorts]
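`draw_kaplan_meier` is a project helper whose internals are not shown on this page. As background, a Kaplan-Meier curve is the running product of (1 - deaths / number at risk) over the observed event times, with censored patients leaving the risk set without lowering the curve. A minimal sketch of that estimator in plain Python, on made-up follow-up data (the function name `kaplan_meier` and the values are illustrative only):

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    # Sort subjects by follow-up time; event = 1 means death, 0 means censored.
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up times in years and event indicators.
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 0])
for t, s in curve:
    print(t, round(s, 3))
```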

MethylScoreAML-37CpGs

for dataset, trial in zip([df_px2, df_test], 
                          ['Discovery', 'Validation']):
    draw_kaplan_meier(model_name='MethylScoreAML Categorical',
                        df=dataset,
                        save_survival_table=False,
                        save_plot=False,
                        show_ci=False,
                        add_risk_counts=False,
                        trialname=trial,
                        figsize=(8,8))
[Figures: Kaplan-Meier curves by MethylScoreAML Categorical for the Discovery and Validation cohorts]

Per risk group#

AML Epigenomic Risk

for dataset, trial in zip([df_px2, df_test], ['Discovery', 'Validation']):

    risk_groups = ['High Risk', 'Low Risk', 'Standard Risk']
    for risk_group in risk_groups:
        draw_kaplan_meier(
            model_name=model_name,
            df=dataset[dataset['Risk Group'] == risk_group],
            save_plot=False,
            save_survival_table=False,
            add_risk_counts=False,
            trialname=f'{trial} {risk_group}',
            figsize=(8, 8))
[Figures: Kaplan-Meier curves by AML Epigenomic Risk within each Risk Group (High Risk, Low Risk, Standard Risk) for the Discovery and Validation cohorts]

MethylScoreAML-37CpGs

for dataset, trial in zip([df_px2, df_test], ['Discovery', 'Validation']):

    risk_groups = ['High Risk', 'Low Risk', 'Standard Risk']
    for risk_group in risk_groups:
        draw_kaplan_meier(
            model_name= 'MethylScoreAML Categorical',
            df=dataset[dataset['Risk Group'] == risk_group],
            save_plot=False,
            save_survival_table=False,
            add_risk_counts=False,
            trialname=f'{trial} {risk_group}',
            figsize=(8, 8))
[Figures: Kaplan-Meier curves by MethylScoreAML Categorical within each Risk Group (High Risk, Low Risk, Standard Risk) for the Discovery and Validation cohorts]

Per risk group (AAML1831 COG)#

AML Epigenomic Risk

for dataset, trial in zip([df_px2],['Discovery']):

    risk_groups = ['High', 'Low', 'Standard']
    for risk_group in risk_groups:
        draw_kaplan_meier(
            model_name=model_name,
            df=dataset[dataset['Risk Group AAML1831'] == risk_group],
            save_plot=False,
            save_survival_table=False,
            add_risk_counts=False,
            trialname=f'{trial} {risk_group} Risk',
            figsize=(8, 8))
[Figures: Kaplan-Meier curves by AML Epigenomic Risk within each AAML1831 risk group (High, Low, Standard), Discovery cohort]

MethylScoreAML-37CpGs

for dataset, trial in zip([df_px2],['Discovery']):

    risk_groups = ['High', 'Low', 'Standard']
    for risk_group in risk_groups:
        draw_kaplan_meier(
            model_name='MethylScoreAML Categorical',
            df=dataset[dataset['Risk Group AAML1831'] == risk_group],
            save_plot=False,
            save_survival_table=False,
            add_risk_counts=False,
            trialname=f'{trial} {risk_group} Risk',
            figsize=(8, 8))
[Figures: Kaplan-Meier curves by MethylScoreAML Categorical within each AAML1831 risk group (High, Low, Standard), Discovery cohort]

Forest Plots#

With MRD 1 and BM blast (%)#

AML Epigenomic Risk

for dataset, trial in zip([df_px2, df_test], ['Discovery', 'Validation']):
    
    df_ = dataset.copy()
    df_['BM leukemic blasts (%)'] = pd.cut(df_['BM leukemic blasts (%)'], bins=[0,50,100], labels=['≤50', '>50'])
    df_['AML_Epigenomic_Risk'] = df_['AML Epigenomic Risk']
    df_['MethylScoreAML_Categorical'] = df_['MethylScoreAML Categorical']
    df_['os_time_5y'] = df_['os.time at 5y']
    df_['os_evnt_5y'] = df_['os.evnt at 5y']
    df_['efs_time_5y'] = df_['efs.time at 5y']
    df_['efs_evnt_5y'] = df_['efs.evnt at 5y']

    draw_forest_plot_withBMblast(time='os_time_5y',
                        event='os_evnt_5y',
                        df=df_,
                        trialname=trial,
                        model_name='AML_Epigenomic_Risk',
                        save_plot=False)

    draw_forest_plot_withBMblast(time='efs_time_5y',
                        event='efs_evnt_5y',
                        df=df_,
                        trialname=trial,
                        model_name='AML_Epigenomic_Risk',
                        save_plot=False)
[Figures: forest plots for AML Epigenomic Risk with MRD 1 and BM blasts, 5-year OS and EFS, Discovery and Validation cohorts]

MethylScoreAML-37CpGs

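for dataset, trial in zip([df_px2, df_test], ['Discovery', 'Validation']):

    # Rebuild df_ for each cohort; otherwise the df_ left over from the previous
    # cell (the validation set) would be plotted under both trial names.
    df_ = dataset.copy()
    df_['BM leukemic blasts (%)'] = pd.cut(df_['BM leukemic blasts (%)'], bins=[0,50,100], labels=['≤50', '>50'])
    df_['MethylScoreAML_Categorical'] = df_['MethylScoreAML Categorical']
    df_['os_time_5y'] = df_['os.time at 5y']
    df_['os_evnt_5y'] = df_['os.evnt at 5y']
    df_['efs_time_5y'] = df_['efs.time at 5y']
    df_['efs_evnt_5y'] = df_['efs.evnt at 5y']

    draw_forest_plot_withBMblast(time='os_time_5y',
                        event='os_evnt_5y',
                        df=df_,
                        trialname=trial,
                        model_name='MethylScoreAML_Categorical',
                        save_plot=False)

    draw_forest_plot_withBMblast(time='efs_time_5y',
                        event='efs_evnt_5y',
                        df=df_,
                        trialname=trial,
                        model_name='MethylScoreAML_Categorical',
                        save_plot=False)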
[Figures: forest plots for MethylScoreAML Categorical with MRD 1 and BM blasts, 5-year OS and EFS, labelled Discovery and Validation]

ROC AUC performance#

Diagnostic Model#

df_dx_auc_train, df_dx_dummies_train = process_dataset_for_multiclass_auc(df_dx)
df_dx_auc_cog, df_dx_dummies_cog = process_dataset_for_multiclass_auc(df_px2)
df_dx_auc_test, df_dx_dummies_test = process_dataset_for_multiclass_auc(df_test)
                                                                        
p1 = plot_multiclass_roc_auc(df_dx_auc_train, df_dx_dummies_train.columns, title='Discovery')
p2 = plot_multiclass_roc_auc(df_dx_auc_cog, df_dx_dummies_cog.columns, title='Discovery COG peds AML')
p3 = plot_multiclass_roc_auc(df_dx_auc_test, df_dx_dummies_test.columns, title='Validation')

# Create a gridplot
p = gridplot([
    [p1, p2, p3,],
    ], toolbar_location='above')

show(p)

Prognostic models#

Discovery#

df_cat = df_px2[['os.evnt at 5y', 'MethylScoreAML Categorical', 'AML Epigenomic Risk']]
df_cont = df_px2[['os.evnt at 5y', 'MethylScoreAML', 'P(Death) at 5y']]

df_cont = df_cont.rename(columns={'P(Death) at 5y':'AML Epigenomic Risk (PaCMAP-LGBM)',
                                  'MethylScoreAML': 'MethylScoreAML (EWAS-CoxPH)'})

df_cat = df_cat.rename(columns={'AML Epigenomic Risk':'AML Epigenomic Risk (PaCMAP-LGBM)',
                                  'MethylScoreAML Categorical': 'MethylScoreAML (EWAS-CoxPH)'})

risk = df_px2[['Risk Group AAML1831','Risk Group']].copy()  # copy: the columns are reassigned below

low_high_dict = {'Low': 0, 'Low Risk': 0,
                'Standard':0.5, 'Standard Risk': 0.5,
                'High': 1, 'High Risk': 1}

risk['Risk Group'] = risk['Risk Group'].map(low_high_dict)
risk['Risk Group AAML1831'] = risk['Risk Group AAML1831'].map(low_high_dict)

df_cat['AML Epigenomic Risk (PaCMAP-LGBM)'] = df_cat['AML Epigenomic Risk (PaCMAP-LGBM)'].map(low_high_dict)
df_cat['MethylScoreAML (EWAS-CoxPH)'] = df_cat['MethylScoreAML (EWAS-CoxPH)'].map(low_high_dict)

df_cont_risk = df_cont.join(risk)
df_cat_risk = df_cat.join(risk)

df_cont_risk = df_cont_risk.fillna(0.5)
df_cat_risk = df_cat_risk.fillna(0.5)

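def plot_roc_auc(df, target, title=None, color_option='colors1'):
    """
    Plot one ROC curve per predictor column using Bokeh.

    Every column of `df` except `target` is treated as a score for the binary
    `target` column; each legend entry shows the column name and its AUC.
    """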
    if color_option == 'colors1':
        colors = ['red','green','blue', 'orange', 'purple', 'brown', 'pink', 'gray', 'olive', 'cyan', 'black']
    elif color_option == 'colors2':
        colors = ['green','blue','red']
    else:
        colors = ['green','red','blue']
    
    if title:
        title_ = title + ', n=' + str(len(df))
    else:
        title_ = ''

    p = figure(title=title_,
               x_axis_label='False Positive Rate',
               y_axis_label='True Positive Rate',
               width=425, height=425,
               tools='save,reset,pan')
    
    p.line([0, 1], [0, 1], line_dash="dashed", color="gray", line_width=1)

    for column, color in zip(df.columns.difference([target]), colors):
        fpr, tpr, _ = roc_curve(df[target], df[column])
        roc_auc = auc(fpr, tpr)
        p.line(fpr, tpr, legend_label=f"{column} ({roc_auc:.2f})",
               color=color, line_width=2, alpha=0.8)


    p.legend.location = "bottom_right"
    p.legend.click_policy="hide"
    p.toolbar.logo = None
    p.legend.label_text_font_size = '8pt'
    p.legend.spacing = 2
    p.xaxis.axis_label_text_font_style = "normal"
    p.yaxis.axis_label_text_font_style = "normal"
    p.legend.background_fill_alpha = 0.8
    p.title.text_font_size = '10pt'

    return p

p1 = plot_roc_auc(df_cont_risk, 'os.evnt at 5y',title= 'Continuous (prob. of high risk)')
p2 = plot_roc_auc(df_cat_risk, 'os.evnt at 5y',title= 'Categorical (high-low risk)')

# Create a gridplot
p = gridplot([[p1, p2]], toolbar_location='above')

show(p)
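Mapping the categorical risk labels to 0, 0.5 and 1 (as `low_high_dict` does above) lets them be scored with the same ROC machinery as the continuous probabilities; the resulting curve has one operating point per category. The AUC equals the probability that a randomly chosen patient with an event is ranked above a randomly chosen patient without one, counting ties as one half. A minimal sketch of that pairwise definition on made-up labels follows; the helper `pairwise_auc` is illustrative and is not part of the project code.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Same label-to-score idea as low_high_dict: ordered categories become numbers.
risk_to_score = {"Low": 0, "Standard": 0.5, "High": 1}

# Hypothetical patients: risk label and 5-year event indicator.
labels = ["High", "High", "Standard", "Standard", "Low", "Low"]
events = [1, 1, 1, 0, 0, 0]

scores = [risk_to_score[label] for label in labels]
auc_value = pairwise_auc(events, scores)
print(round(auc_value, 3))
```

With only three distinct scores, many pairs tie, which is why the categorical curves tend to have lower and coarser AUCs than the continuous ones.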

Validation#

df_cat = df_test[['os.evnt at 5y', 'MethylScoreAML Categorical', 'AML Epigenomic Risk']]
df_cont = df_test[['os.evnt at 5y', 'MethylScoreAML', 'P(Death) at 5y']]

df_cont = df_cont.rename(columns={'P(Death) at 5y':'AML Epigenomic Risk (PaCMAP-LGBM)',
                                  'MethylScoreAML': 'MethylScoreAML (EWAS-CoxPH)'})

df_cat = df_cat.rename(columns={'AML Epigenomic Risk':'AML Epigenomic Risk (PaCMAP-LGBM)',
                                  'MethylScoreAML Categorical': 'MethylScoreAML (EWAS-CoxPH)'})

risk = df_test[['Risk Group']].copy()  # copy: the column is reassigned below
risk['Risk Group'] = risk['Risk Group'].map(low_high_dict)

df_cat['AML Epigenomic Risk (PaCMAP-LGBM)'] = df_cat['AML Epigenomic Risk (PaCMAP-LGBM)'].map(low_high_dict)
df_cat['MethylScoreAML (EWAS-CoxPH)'] = df_cat['MethylScoreAML (EWAS-CoxPH)'].map(low_high_dict)

df_cont_risk = df_cont.join(risk)
df_cat_risk = df_cat.join(risk)

# Rename `Risk Group` to `Risk Group AML02-08`
df_cont_risk = df_cont_risk.rename(columns={'Risk Group':'Risk Group AML02-08'})
df_cat_risk = df_cat_risk.rename(columns={'Risk Group':'Risk Group AML02-08'})

p1 = plot_roc_auc(df_cont_risk, 'os.evnt at 5y',title= 'Continuous (prob. of high risk)')
p2 = plot_roc_auc(df_cat_risk, 'os.evnt at 5y',title= 'Categorical (high-low risk)')

# Create a gridplot
p = gridplot([[p1, p2]], toolbar_location='above')

show(p)

Sankey plots#

Note

The Sankey plots below compare how samples are distributed across two sets of categories. The width of each link is proportional to the number of patients shared by the two categories it connects.
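As a rough illustration of what a link width encodes, the sketch below counts how many samples share each pair of labels; those counts are what a Sankey link would be scaled by. The labels are made up, and `draw_sankey_plot` itself may compute the flows differently.

```python
from collections import Counter

# Hypothetical paired annotations for six samples (left label, right label).
left = ["AML", "AML", "AML", "ALL", "ALL", "MDS"]
right = ["Cluster 1", "Cluster 1", "Cluster 2", "Cluster 3", "Cluster 3", "Cluster 2"]

# One Sankey link per distinct (left, right) pair; its width is the pair count.
flows = Counter(zip(left, right))

for (source, target), count in flows.most_common():
    print(f"{source} -> {target}: {count}")
```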

Samples with annotated diagnosis info#

colors = get_custom_color_palette()


draw_sankey_plot(df_train, 'WHO 2022 Diagnosis', 'AL Epigenomic Subtype', colors,
                 title='Discovery cohort', fig_size=(4, 11),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_px2, 'WHO 2022 Diagnosis', 'AL Epigenomic Subtype', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(4, 10),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_test, 'WHO 2022 Diagnosis', 'AL Epigenomic Subtype', colors,
                 title= 'Validation cohort',fig_size=(3, 7),
                 fontsize=8, nan_action='drop')
[Figures: Sankey plots of WHO 2022 Diagnosis vs AL Epigenomic Subtype for the Discovery cohort, the Discovery cohort (COG peds AML Dx samples only), and the Validation cohort]

Predictions in samples for which no WHO 22 Dx data was available#

draw_sankey_plot(df_train, 'WHO 2022 Diagnosis', 'AL Epigenomic Subtype', colors,
                 title='Discovery cohort', fig_size=(4, 9),
                 fontsize=8, nan_action='keep only')

draw_sankey_plot(df_px2, 'WHO 2022 Diagnosis', 'AL Epigenomic Subtype', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(4, 8),
                 fontsize=8, nan_action='keep only')

draw_sankey_plot(df_test, 'WHO 2022 Diagnosis', 'AL Epigenomic Subtype', colors,
                 title= 'Validation cohort',fig_size=(4, 8),
                 fontsize=8, nan_action='keep only')
[Figures: Sankey plots of AL Epigenomic Subtype predictions for samples without a WHO 2022 Diagnosis, for the Discovery cohort, the Discovery cohort (COG peds AML Dx samples only), and the Validation cohort]

Reason for unclassified samples#

draw_sankey_plot(df_train, 'WHO 2022 Diagnosis', 'Primary Cytogenetic Code', colors,
                 title='Discovery cohort', fig_size=(4, 6),
                 fontsize=8, nan_action='keep only')

draw_sankey_plot(df_px2, 'WHO 2022 Diagnosis', 'Gene Fusion', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(4, 9),
                 fontsize=8, nan_action='keep only')

draw_sankey_plot(df_test, 'WHO 2022 Diagnosis', 'Primary Cytogenetic Code', colors,
                 title= 'Validation cohort',fig_size=(2, 3),
                 fontsize=8, nan_action='keep only')
[Figures: Sankey plots for samples without a WHO 2022 Diagnosis against Primary Cytogenetic Code (Discovery cohort, Validation cohort) and Gene Fusion (Discovery cohort, COG peds AML Dx samples only)]

Risk group comparison in COG#

draw_sankey_plot(df_px2, 'Risk Group', 'Risk Group AAML1831', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(2, 4),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_px2, 'Risk Group AAML1831', 'AML Epigenomic Risk', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(2, 4),
                 fontsize=8, nan_action='drop')
[Figures: Sankey plots of Risk Group vs Risk Group AAML1831 and of Risk Group AAML1831 vs AML Epigenomic Risk, Discovery cohort (COG peds AML Dx samples only)]

Px and Dx model comparison#

draw_sankey_plot(df_train, 'AML Epigenomic Risk', 'AL Epigenomic Subtype', colors,
                 title='Discovery cohort', fig_size=(3, 10),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_px2, 'AML Epigenomic Risk', 'AL Epigenomic Subtype', colors,
                 title= 'Discovery cohort (COG peds AML Dx samples only)',fig_size=(3, 10),
                 fontsize=8, nan_action='drop')

draw_sankey_plot(df_test, 'AML Epigenomic Risk', 'AL Epigenomic Subtype', colors,
                 title= 'Validation cohort',fig_size=(3, 8),
                 fontsize=8, nan_action='drop')
[Figures: Sankey plots of AML Epigenomic Risk vs AL Epigenomic Subtype for the Discovery cohort, the Discovery cohort (COG peds AML Dx samples only), and the Validation cohort]

Watermark#

Author: Francisco_Marchi@Lamba_Lab_UF

Python implementation: CPython
Python version       : 3.10.13
IPython version      : 8.20.0

pandas         : 2.2.0
seaborn        : 0.13.2
matplotlib     : 3.8.2
tableone       : 0.8.0
sklearn        : 1.4.0
lifelines      : 0.28.0
statannotations: not installed

Compiler    : GCC 11.4.0
OS          : Linux
Release     : 5.15.133.1-microsoft-standard-WSL2
Machine     : x86_64
Processor   : x86_64
CPU cores   : 32
Architecture: 64bit